Minimizing the stochasticity of halos in large-scale structure surveys 



Nico Hamaus,^1 Uros Seljak,^{1,2,3} Vincent Desjacques,^1 Robert E. Smith,^1 and Tobias Baldauf^1

^1 Institute for Theoretical Physics, University of Zurich, 8057 Zurich, Switzerland
^2 Physics Department, Astronomy Department and Lawrence Berkeley National Laboratory,
University of California, Berkeley, California 94720, USA
^3 Ewha University, Seoul 120-750, S. Korea
(Dated: August 17, 2010) 

In recent work (Seljak, Hamaus and Desjacques 2009) it was found that weighting central halo 
galaxies by halo mass can significantly suppress their stochasticity relative to the dark matter, well 
below the Poisson model expectation. This is useful for constraining relations between galaxies 
and the dark matter, such as the galaxy bias, especially in situations where sampling variance 
errors can be eliminated. In this paper we extend this study with the goal of finding the optimal 
mass-dependent halo weighting. We use N-body simulations to perform a general analysis of halo
stochasticity and its dependence on halo mass. We investigate the stochasticity matrix, defined as
$C_{ij} = \langle(\delta_i - b_i\delta_m)(\delta_j - b_j\delta_m)\rangle$, where $\delta_m$ is the dark matter overdensity in Fourier space, $\delta_i$ the
halo overdensity of the i-th halo mass bin, and $b_i$ the corresponding halo bias. In contrast to the
Poisson model predictions we detect nonvanishing correlations between different mass bins. We 
also find the diagonal terms to be sub-Poissonian for the highest-mass halos. The diagonalization 
of this matrix results in one large and one low eigenvalue, with the remaining eigenvalues close to 
the Poisson prediction $1/\bar{n}$, where $\bar{n}$ is the mean halo number density. The eigenmode with the
lowest eigenvalue contains most of the information and the corresponding eigenvector provides an 
optimal weighting function to minimize the stochasticity between halos and dark matter. We find 
this optimal weighting function to match linear mass weighting at high masses, while at the low- 
mass end the weights approach a constant whose value depends on the low-mass cut in the halo 
mass function. This weighting further suppresses the stochasticity as compared to the previously 
explored mass weighting. Finally, we employ the halo model to derive the stochasticity matrix and 
the scale-dependent bias from an analytical perspective. It is remarkably successful in reproducing 
our numerical results and predicts that the stochasticity between halos and the dark matter can be 
reduced further when going to halo masses lower than we can resolve in current simulations. 

PACS numbers: 98.80, 98.65, 98.62 



I. INTRODUCTION 

The large-scale structure (LSS) of the Universe carries 
a wealth of information about the physics that governs 
cosmological evolution. By measuring LSS we can at- 
tempt to answer such fundamental questions as what the 
Universe is made of, what the initial conditions for the 
structure in the Universe were, and what its future will 
be. Traditionally, the easiest way to observe it is by mea- 
suring galaxy positions and redshifts, which provides the 
3D spatial distribution of LSS via so-called redshift surveys
(e.g., [1]).

However, dark matter dominates the evolution and 
relation to fundamental cosmological parameters, while 
galaxies are only biased, stochastic tracers of this under- 
lying density field. On large scales, this bias is expected 
to be a constant offset in clustering amplitude relative 
to the dark matter, which can be removed to reconstruct 
the dark matter power spectrum [?]. Nevertheless, this
reconstruction is hampered by a certain degree of
randomness in the distribution of galaxies, which arises
from the nonlinear and stochastic relation between galaxies
and the dark matter. In the simplest model one describes
this stochasticity with the Poisson model of shot
noise. Shot noise constitutes a source of error in the
power spectrum [?] and therefore limits the accuracy of
cosmological constraints. The Poisson model predicts it 
to be determined by the inverse of the galaxy number 
density, assuming galaxies to be random and pointlike 
tracers. However, galaxies are born inside dark matter 
halos and for these extended, gravitationally interacting 
objects, the shot noise model is harder to describe. It 
is thus desirable to develop estimators that are least af- 
fected by this source of stochasticity. 

In Fourier space the stochasticity of galaxies is usually
described by the cross-correlation coefficient

$$ r_{gm} \equiv \frac{P_{gm}}{\sqrt{\hat{P}_{gg} P_{mm}}}, \tag{1} $$

where $\hat{P}_{gg}$ is the measured galaxy autopower spectrum,
$P_{mm}$ the dark matter autopower spectrum, and $P_{gm}$ the
cross-power spectrum of the two components. The cross-correlation
coefficient can be related to the shot noise
power $\sigma^2$, which is commonly defined via the decomposition
$\hat{P}_{gg} = P_{gg} + \sigma^2$, with $P_{gg} = b^2 P_{mm}$ and the bias
defined as $b = P_{gm}/P_{mm}$. This yields

$$ r_{gm}^2 = \left(1 + \frac{\sigma^2}{b^2 P_{mm}}\right)^{-1}. \tag{2} $$

Thus, the lower the shot noise, the smaller the stochasticity,
i.e., the deviation of the cross-correlation coefficient
from unity. Minimizing this stochasticity is important if 
one attempts to determine the relation between galaxies 
and the underlying dark matter. One example of such an 
application is correlating the weak lensing signal, which 
traces dark matter, to properly radially weighted galaxies [?]:
an accurate determination of the galaxy bias can
be combined with a 3-dimensional galaxy redshift sur- 
vey to greatly reduce the statistical errors relative to the 
corresponding 2-dimensional weak lensing survey. 
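
As a short worked step, the link between the cross-correlation coefficient and the shot noise can be written out explicitly. The sketch below assumes the notation $\hat{P}_{gg} = b^2 P_{mm} + \sigma^2$ for the measured galaxy power (the hat is a notational choice made here) and $b = P_{gm}/P_{mm}$ for the bias.

```latex
r_{gm}^2
  = \frac{P_{gm}^2}{\hat{P}_{gg}\, P_{mm}}
  = \frac{b^2 P_{mm}^2}{\left(b^2 P_{mm} + \sigma^2\right) P_{mm}}
  = \left(1 + \frac{\sigma^2}{b^2 P_{mm}}\right)^{-1}
```

For instance, a shot noise to power ratio of $\sigma^2/(b^2 P_{mm}) = 0.1$ gives $r_{gm}^2 = 1/1.1$, i.e. $r_{gm} \approx 0.953$.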

The ultimate precision with which the galaxy bias
can be estimated from such methods is determined by the
cross-correlation coefficient, and previous work has shown
that it can deviate significantly from unity for uniformly
weighted galaxies or halos [?]. However, it was demonstrated
recently that weighting halos by mass considerably
reduces the stochasticity between halos and the dark
matter [?]. The purpose of this paper is to explore this
more systematically and to develop an optimal weighting 
method that achieves the smallest possible stochasticity. 

Our definition of the shot noise above is relevant for
the methods that attempt to cancel sampling variance (or
cosmic variance) [?]; this will be our primary motivation
in this paper. Alternatively, the shot noise is often
associated with its contribution to the error in the power 
spectrum determination, this error usually being decom- 
posed into sampling variance and shot noise. Sampling 
variance refers to the fact that in a given volume V the 
number Nk of observable Fourier modes of a given wave 
vector amplitude is finite. In the case of a Gaussian ran- 
dom field the relative error in the measured galaxy power 
spectrum $\hat{P}_{gg}$ due to the sum of the two errors is given
by $\sigma_{\hat{P}}/\hat{P}_{gg} = 1/\sqrt{N_k}$ (each complex Fourier mode has
two independent realizations and we only count modes
with positive wave vector components). Using the above
decomposition of the measured power $\hat{P}_{gg}$ into intrinsic
power $P_{gg}$ and shot noise $\sigma^2$, one finds

$$ \frac{\sigma_{\hat{P}}}{P_{gg}} = \frac{1}{\sqrt{N_k}} \left(1 + \frac{\sigma^2}{P_{gg}}\right). \tag{3} $$

This definition is ambiguous, since it leaves the decomposition
of the measured power into shot noise and shot
noise subtracted power unspecified. Most of the analyses
so far have simply assumed the Poisson model, where
the shot noise is given by the inverse of the number density
of galaxies, $\sigma^2 = 1/\bar{n}$. A second possibility is to
define the shot noise such that the galaxy bias estimator
$P_{gg}/P_{mm}$ becomes as scale independent as possible.
The third way is to define it via the stochasticity between
halos and the dark matter, i.e., the cross-correlation coefficient
$r_{gm}$ as in Eq. (2). We choose the third definition,
but will comment on the relations to the other two meth- 
ods as well. 
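
To make the two error contributions concrete, the following minimal sketch evaluates the relative power spectrum error for a Gaussian field as (1 + shot noise / power) / sqrt(N_k). The function name and the numerical inputs are illustrative assumptions, and the Poisson value 1/n is only one of the possible shot noise definitions discussed above.

```python
import numpy as np

def relative_power_error(n_modes, power, shot_noise):
    """Relative error (1 + sigma^2 / P) / sqrt(N_k) for a Gaussian field."""
    return (1.0 + shot_noise / power) / np.sqrt(n_modes)

nbar = 3.7e-4                  # tracer number density in h^3 Mpc^-3 (illustrative)
poisson_noise = 1.0 / nbar     # Poisson model shot noise in h^-3 Mpc^3
power = 2.0e4                  # intrinsic power in h^-3 Mpc^3 (illustrative assumption)

# error with Poisson shot noise, and the pure sampling variance limit
err = relative_power_error(n_modes=400, power=power, shot_noise=poisson_noise)
err_sampling_only = relative_power_error(n_modes=400, power=power, shot_noise=0.0)
```

With these inputs the shot noise term raises the error only slightly above the sampling variance floor of 1/sqrt(N_k), which is the regime where sampling variance cancellation becomes relevant.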

It is important to emphasize here that the first two def- 
initions are not directly related to the applications where 
the sampling variance error can be eliminated, since they 
do not include correlations between tracers (where the 
dark matter itself can also be seen as a tracer). While 
in this paper we focus on minimizing the error on the 
bias estimation using the sampling variance canceling 
method, there are other applications where correlating 
dark matter and galaxies, or two differently biased galaxy 
samples, allows us to reduce the sampling variance error 
[?]. In such cases the stochasticity, or the shot noise to
power ratio as defined in Eq. (2), is the dominant source
of error and methods capable of reducing it offer the potential
to further advance the precision of cosmological
tests. Indeed, since the error on the power spectrum as in
Eq. (3) contains two contributions, in the past there was
not much interest in investigating the situation where the 
shot noise is much smaller than sampling variance. It is 
the situations where the sampling variance error vanishes 
that are most relevant for our study. 

In this paper we will focus on the relation between ha- 
los and the underlying dark matter, using two-point cor- 
relations in Fourier space (i.e. the power spectrum) as a 
statistical estimator. A further step to connect halos to 
observations of galaxies can be accomplished by specification
of a halo occupation distribution for galaxies [10],
but we do not investigate this in any detail. Alterna- 
tively, one can think of the halos as a sample of central 
halo galaxies from which satellites have been removed. 



II. SHOT NOISE MATRIX 

The term shot noise is usually related to the fact that 
the sampling of a continuous field with a finite number of 
objects yields a spurious contribution of power to its autopower
spectrum. In the Poisson model the contribution
to the autopower spectrum is $1/\bar{n}$, where $\bar{n} = N/V$ is the
mean number density of objects sampling the continuous
field, whereas the cross-power spectrum of two distinct
samples of objects is not affected (see, e.g., [11, 12]).
However, in cosmology one studies galaxies residing in 
dark matter halos, which are not a random subsample of 
the dark matter particles. The Poisson model does not 
account for that fact. 
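
The Poisson expectation can be checked numerically with a few lines of code. The sketch below is not the pipeline used in this paper: it places random points in a periodic box (box size, grid size and point number are arbitrary assumptions), bins them with nearest-grid-point assignment, and compares the mean measured power with 1/n = V/N.

```python
import numpy as np

def poisson_shot_noise(n_points, ngrid, boxsize, seed=0):
    """Measure the white-noise power of random points and return it with V/N."""
    rng = np.random.default_rng(seed)
    pos = rng.uniform(0.0, boxsize, size=(n_points, 3))
    # nearest-grid-point assignment of the points to cells
    cells = np.floor(pos / boxsize * ngrid).astype(int) % ngrid
    counts = np.zeros((ngrid, ngrid, ngrid))
    np.add.at(counts, (cells[:, 0], cells[:, 1], cells[:, 2]), 1.0)
    delta = counts / counts.mean() - 1.0
    volume = boxsize ** 3
    # normalisation chosen such that <|delta_k|^2> estimates the power spectrum
    delta_k = np.fft.fftn(delta) * np.sqrt(volume) / ngrid ** 3
    power = np.abs(delta_k) ** 2
    power[0, 0, 0] = np.nan            # the k = 0 mode carries no fluctuation
    return np.nanmean(power), volume / n_points

measured, expected = poisson_shot_noise(20000, ngrid=16, boxsize=100.0)
```

For randomly placed points the measured power is flat in k and scatters around V/N, which is the behavior that halos are found to violate in the analysis of this paper.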

In recent work it has been argued that there are other 
nonlinear terms that appear like white noise terms in the 
power spectrum of halos and so a more general approach
is needed to determine the shot noise [13]. In order to account
for this fact we define the shot noise more generally
as the two-point correlation matrix

$$ C_{ij} \equiv \langle(\delta_i - b_i\delta_m)(\delta_j - b_j\delta_m)\rangle. \tag{4} $$

Here the subscripts i and j refer to specific subsamples
of the halo density field with overdensities $\delta_i$ and $\delta_j$ and
corresponding scale-independent bias $b_i$ and $b_j$, respectively.
The dark matter density fluctuation is denoted by
$\delta_m$ and the angled brackets denote an ensemble average.
We work in Fourier space and the $\delta$'s are the complex
Fourier components of the density field,

$$ \delta(\mathbf{k}) = \frac{1}{\sqrt{V}} \int \delta(\mathbf{x})\, e^{-i\mathbf{k}\cdot\mathbf{x}}\, d^3x. \tag{5} $$

However, we handle the complex density modes $\delta$ as real
quantities, since their real and imaginary parts are uncorrelated
and one can treat them as two independent
modes. Further, we assume the overdensity of a particular
halo sample i to be composed of two terms [14]:

$$ \delta_i = b_i\delta_m + \epsilon_i, \tag{6} $$

where $\epsilon_i$ is a random variable of zero mean assumed to be
uncorrelated with the signal, i.e., $\langle\epsilon_i\delta_m\rangle = 0$. It follows
that the bias parameter $b_i$ can be obtained from cross
correlation with the dark matter,

$$ b_i = \frac{\langle\delta_i\delta_m\rangle}{\langle\delta_m^2\rangle}, \tag{7} $$

and the shot noise matrix can be written as $C_{ij} = \langle\epsilon_i\epsilon_j\rangle$.
With these definitions the cross-correlation coefficient between
any given halo bin i and the dark matter,

$$ r_{im} \equiv \frac{\langle\delta_i\delta_m\rangle}{\sqrt{\langle\delta_i^2\rangle\langle\delta_m^2\rangle}}, \tag{8} $$

becomes unity when we subtract the shot noise component
$C_{ii}$ from $\langle\delta_i^2\rangle$. We thus recover the shot noise
definition from Eq. (2) and define $P_{ii} = \langle\delta_i^2\rangle - C_{ii}$ to
be the halo autopower spectrum (shot noise subtracted),
$P_{im} = \langle\delta_i\delta_m\rangle$ the halo-matter cross-power spectrum, and
$P_{mm} = \langle\delta_m^2\rangle$ the matter autopower spectrum (we assume
the shot noise of the matter auto-, as well as the halo-matter
cross-power spectrum to vanish). Note that these
relations are still self-consistent if we allow the bias $b_i$ to
be scale dependent. Here we will, however, explore the
simpler case assuming scale-independent bias, which is a
good approximation on large scales.

In the Poisson model the shot noise matrix $C_{ij}$ is diagonal,
but this is not necessarily the case in our definition.
The objective of this paper is to study all the
components of this matrix using N-body simulations. In
particular we divide the halos into bins of different mass,
but equal number density. A diagonalization of the shot
noise matrix will then provide its eigenvalues and eigenvectors,
which contain important information about the
stochastic properties of the halo density field.



III. SIMULATIONS

We use the zHORIZON simulations [12], 30 realizations
of numerical N-body simulations with $750^3$ particles of
mass $5.55 \times 10^{11}\,h^{-1}M_\odot$ and a box-size of $1.5\,h^{-1}\mathrm{Gpc}$ (total
effective volume of $V_{tot} = 101.25\,h^{-3}\mathrm{Gpc}^3$) to accurately
sample the density field of cold dark matter. The
simulations were performed at the University of Zurich
supercomputers ZBOX2 and ZBOX3 with the GADGET II
code [13]. We chose the cosmological parameters to be
close to the outcome of the WMAP5 data release [?],
namely $\Omega_m = 0.25$, $\Omega_\Lambda = 0.75$, $\Omega_b = 0.04$, $\sigma_8 = 0.8$,
$n_s = 1.0$ and $h = 0.7$. The transfer function was computed
with the CMBFAST code [?] and the initial conditions
were set up at redshift z = 50 with the 2LPT initial
conditions generator [13, ?].

We applied the friends-of-friends (FoF) algorithm B-FOF
by V. Springel with a linking length of 20% of
the mean interparticle distance and a minimum of 30
particles per halo to generate halo catalogs. The resulting
catalogs contain about $1.3 \times 10^6$ halos ($\bar{n} \simeq
3.7 \times 10^{-4}\,h^3\mathrm{Mpc}^{-3}$) with masses between $M_{min} \simeq 1.1 \times
10^{13}\,h^{-1}M_\odot$ and $M_{max} \simeq 3.1 \times 10^{15}\,h^{-1}M_\odot$. In order
to investigate the influence of the mass resolution on
our results, we employ another set of 5 N-body simulations
[3] of box-size $1.6\,h^{-1}\mathrm{Gpc}$ with $1024^3$ particles of
mass $3.0 \times 10^{11}\,h^{-1}M_\odot$, resolving halos down to $M_{min} \simeq
5.9 \times 10^{12}\,h^{-1}M_\odot$ ($\bar{n} \simeq 7.0 \times 10^{-4}\,h^3\mathrm{Mpc}^{-3}$). All other parameters
of this simulation are similar to the one above,
namely $\Omega_m = 0.279$, $\Omega_\Lambda = 0.721$, $\Omega_b = 0.046$, $\sigma_8 = 0.81$,
$n_s = 0.96$, $h = 0.7$. One further realization with these
parameters was generated with an even higher mass resolution,
namely $1536^3$ particles of mass $4.7 \times 10^{10}\,h^{-1}M_\odot$
in a box of $1.3\,h^{-1}\mathrm{Gpc}$, resolving halos down to $M_{min} \simeq
9.4 \times 10^{11}\,h^{-1}M_\odot$ ($\bar{n} \simeq 4.0 \times 10^{-3}\,h^3\mathrm{Mpc}^{-3}$).

The density fields of dark matter and halos in configuration
space were computed via interpolation of the
particles onto a cubical mesh with $512^3$ grid points using
a cloud-in-cell mesh assignment algorithm [20]. We then
applied fast Fourier transforms to compute the modes
of the fields in k-space. All our results are presented at
z = 0 and we do not explore the redshift dependence,
because at higher redshifts the halo number density is
lower and we wish to explore the stochastic properties of
halos in the high density limit.
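
The mesh assignment and Fourier transform step can be sketched as follows. This is a minimal illustration with NumPy, not the code used for the simulations; the function names, the small grid size and the random test positions are assumptions made for the example, and no window function deconvolution is applied.

```python
import numpy as np

def cic_assign(pos, ngrid, boxsize):
    """Deposit unit-mass particles on a periodic ngrid^3 mesh with cloud-in-cell weights."""
    x = np.asarray(pos, dtype=float) / boxsize * ngrid   # positions in cell units
    i0 = np.floor(x).astype(int)                         # lower neighbouring grid point
    f = x - i0                                           # fractional distance to it
    rho = np.zeros((ngrid, ngrid, ngrid))
    for dx in (0, 1):
        for dy in (0, 1):
            for dz in (0, 1):
                shift = np.array([dx, dy, dz])
                # linear weight: (1 - f) for the lower point, f for the upper one
                w = np.prod(np.where(shift == 0, 1.0 - f, f), axis=1)
                idx = (i0 + shift) % ngrid               # periodic wrap
                np.add.at(rho, (idx[:, 0], idx[:, 1], idx[:, 2]), w)
    return rho

def overdensity_modes(pos, ngrid, boxsize):
    """Return the Fourier modes of delta = rho / mean(rho) - 1."""
    rho = cic_assign(pos, ngrid, boxsize)
    delta = rho / rho.mean() - 1.0
    # normalisation chosen such that <|delta_k|^2> estimates P(k)
    volume = boxsize ** 3
    return np.fft.rfftn(delta) * np.sqrt(volume) / ngrid ** 3

rng = np.random.default_rng(0)
positions = rng.uniform(0.0, 100.0, size=(1000, 3))
mesh = cic_assign(positions, ngrid=16, boxsize=100.0)
modes = overdensity_modes(positions, ngrid=16, boxsize=100.0)
```

Each particle shares its mass among the eight surrounding grid points, so the total mass on the mesh is conserved; the same routine can be applied to dark matter particles and to halo positions.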



IV. ANALYSIS 

A. Estimators for the binned halo density field 

The shot noise matrix from Eq. (4) is calculated by
plugging in the Fourier modes provided by our simulations
and averaging over a range of wave numbers. The
bias is determined via the ratio in Eq. (7); we thus neglect
any shot noise contribution in this expression. In Eq. (4)
we use the scale-independent bias, which is obtained by
averaging over the range $k < 0.024\,h\,\mathrm{Mpc}^{-1}$, corresponding
to our first four k-bins. This range of wave numbers
is least affected by scale dependence, as apparent from
the middle panel of Fig. 1.
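
The estimators described above can be sketched in a few lines. The example below assumes that the halo modes are stored as an array of shape (bins, modes); the function name and the synthetic Gaussian test data are illustrative assumptions, not simulation output.

```python
import numpy as np

def bias_and_shot_noise(delta_h, delta_m):
    """delta_h: (n_bins, n_modes) complex halo modes; delta_m: (n_modes,) matter modes."""
    p_mm = np.mean(np.abs(delta_m) ** 2)
    # bias from the halo-matter cross power divided by the matter power
    bias = np.mean(delta_h * np.conj(delta_m), axis=1).real / p_mm
    # residual field epsilon_i = delta_i - b_i delta_m
    eps = delta_h - bias[:, None] * delta_m[None, :]
    # shot noise matrix C_ij = <eps_i eps_j^*>, averaged over the modes
    cov = (eps @ np.conj(eps).T).real / delta_m.size
    return bias, cov

# synthetic test data (all numbers below are made up for illustration)
rng = np.random.default_rng(1)
n_modes = 50000
true_bias = np.array([1.0, 1.5, 2.5])
true_cov = np.array([[1.0, 0.1, -0.2],
                     [0.1, 1.0, -0.3],
                     [-0.2, -0.3, 0.5]])

def complex_normal(shape):
    """Complex Gaussian numbers with unit variance."""
    return (rng.normal(size=shape) + 1j * rng.normal(size=shape)) / np.sqrt(2.0)

delta_m = 3.0 * complex_normal(n_modes)
noise = np.linalg.cholesky(true_cov) @ complex_normal((3, n_modes))
delta_h = true_bias[:, None] * delta_m + noise
bias, cov = bias_and_shot_noise(delta_h, delta_m)
```

In this toy setup the recovered matrix reproduces the input covariance, including its negative off-diagonal entries, up to the statistical scatter set by the number of modes.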

For the division into subsamples we bin the full halo 
catalog into bins of different mass, keeping the number 
density of each bin constant. This is done by sorting 
the halos according to their mass and then dividing this 
sorted array into subarrays with an equal number of ha- 
los. We use 10 bins for most of the plots presented here, 
since more bins make them increasingly hard to read. 
For some plots we also show the results with 30 and 100 
bins to provide a more accurate sampling of halo masses. 
A convergence of the results can only be reached with 
infinitely many bins, which is numerically impossible to 
accomplish. However, using linear mass weighting of the 
halos within each bin makes the results converge faster, 
as will be justified later. We apply this technique to our 
100 halo mass bins, as shown in some of the following 
plots. 
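
The binning procedure can be sketched as follows. The helper name and the toy power-law mass distribution are assumptions for illustration only.

```python
import numpy as np

def equal_number_mass_bins(masses, n_bins):
    """Sort halos by mass and split the sorted index array into n_bins nearly equal parts."""
    order = np.argsort(masses)
    return np.array_split(order, n_bins)

rng = np.random.default_rng(2)
# toy halo masses drawn from a steep power law (an assumption for illustration)
masses = 1.0e13 * (1.0 - rng.uniform(size=1000)) ** (-1.0 / 1.5)
bins = equal_number_mass_bins(masses, 10)
mean_mass = np.array([masses[idx].mean() for idx in bins])
width = np.array([masses[idx].max() - masses[idx].min() for idx in bins])
```

Because the mass function falls steeply, the bins at the high-mass end span a much wider mass range than those at the low-mass end, even though every bin holds the same number of halos.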



1. Power spectrum, bias and cross-correlation coefficient

We start by looking at the autopower spectrum of the
halos in each mass bin as shown in the top panel of Fig. 1,
using our lower resolution simulation with average halo
number density of $\bar{n} \simeq 3.7 \times 10^{-4}\,h^3\mathrm{Mpc}^{-3}$. From the halo
autopower spectra we have subtracted $C_{ii}$, the diagonal
elements of the shot noise matrix from Eq. (4),
depicted below in Fig. 2. The halo subsamples increasingly
gain power with higher mass due to their enhanced
bias, which is plotted in the middle panel of Fig. 1.
This plot shows the bias obtained from Eq. (7) as a
function of k. The scale-independent bias is drawn as
straight dotted lines for comparison. On large scales,
roughly below $k \simeq 0.015\,h\,\mathrm{Mpc}^{-1}$, sampling variance
makes the curves appear more noisy, while on smaller
scales, $k > 0.04\,h\,\mathrm{Mpc}^{-1}$, possibly nonlinear evolution
of the density field or higher-order bias corrections set
in, causing the halo bias to pick up a scale dependence
[?]. This scale dependence is most pronounced for the
highest-mass bin.

The degree of halo stochasticity can also be assessed 
in the cross-correlation coefficient between halos and the 
dark matter, as depicted in the bottom panel of Fig. 1.
We see that the more massive halos are a less stochastic 
tracer of the dark matter. Note that subtracting our def- 
inition of the shot noise from the autocorrelation of halos 
makes the cross-correlation coefficient become unity. It 
has the nice property that the bias determined from halo 
auto-correlation and from halo-matter cross-correlation
is identical by definition.

FIG. 1: TOP: Autopower spectra for 10 consecutive halo mass
bins (solid colored lines) and the dark matter (dotted black
line). MIDDLE: Bias of the 10 halo bins determined from the
cross power with the dark matter; the dotted lines show the
scale-independent bias. BOTTOM: Cross-correlation coefficients
of the 10 halo bins with the dark matter (solid colored
lines) without shot noise subtraction. When the shot noise
$C_{ii}$ is subtracted from $\langle\delta_i^2\rangle$, by definition the cross-correlation
coefficient becomes unity (dashed lines). For reference, the
value r = 1 is plotted (dotted black line). The error bars on
all three plots were computed from the ensemble of the 30
independent realizations of our simulations. They show the
standard deviation on the mean of each quantity shown.



2. Shot noise matrix

We now turn to the calculation of the shot noise matrix
for the 10 halo mass bins. Figure 2 shows each element of
$C_{ij}$ plotted against the wave number. As expected, the
diagonal components of the shot noise (solid lines) are
dominant. They all show essentially no scale dependence
and match the usual expectation of $1/\bar{n}_i$ very well (where
$\bar{n}_i$ is the mean number density of halos in bin i), except
for the highest-mass bin (solid, black line), which is suppressed
by about a factor of 2. The conventional expression
for the shot noise breaks down for the highest-mass
halos. This sub-Poissonian behavior of the shot noise at
high masses has been found in simulations before [13, ?].

Moreover we find both negative and positive elements 
in the off-diagonal parts of the shot noise matrix. In 
the case of N halo bins, in total there are N(N + 1)/2
independent elements, since $C_{ij}$ is a symmetric matrix.
These are composed of N diagonal and N(N - 1)/2 off-diagonal
elements. Hence, in the case of N = 10, there
are 45 off-diagonal elements and we find 33 of them to 
be positive (dashed lines), while 12 are negative (dotted 
lines). While all off-diagonal elements are white noise 
like, i.e., scale independent, most of the negative com- 
ponents have a higher magnitude than the positive ones. 
The former correspond to the cross correlations of any 
given halo mass bin with the highest-mass halos. 

This finding is rather surprising, because shot noise
cross correlations are usually neglected. Since there
appear to be negative off-diagonal components in the shot
noise matrix and their magnitude exceeds the positive
ones, one might expect that a suitable linear combination
of the halo bins can reduce the total shot noise, as found
in [?]. In the subsequent section we will show that this
expectation is indeed fulfilled. 

B. Eigensystem of the shot noise matrix 

In order to find the principal components of the shot
noise matrix we have to diagonalize it by determining its
eigenvalues $\lambda^{(l)}$ and eigenvectors $V_i^{(l)}$, defined via

$$ \sum_j C_{ij} V_j^{(l)} = \lambda^{(l)} V_i^{(l)}. \tag{9} $$

The superscript (l) is used to enumerate the eigenvalues
and eigenvectors, while the subscripts i and j refer to the
components of the vectors and matrices. We use routines
from [24] to do the calculations.
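
The diagonalization can be illustrated with a toy matrix. The sketch below uses numpy.linalg.eigh in place of the routines from [24]; the matrix itself (a Poisson diagonal plus a negative rank-one term) and all numbers are assumptions chosen only to mimic the qualitative structure, not measured values.

```python
import numpy as np

n_bins = 10
nbar_i = 1.0e-4                           # toy number density per bin (assumed)
w = np.linspace(1.0, 10.0, n_bins)        # toy mass-like vector (assumed)
w /= np.linalg.norm(w)
# Poisson diagonal plus a negative rank-one term coupling all bins
C = np.eye(n_bins) / nbar_i - 0.6 / nbar_i * np.outer(w, w)

eigvals, eigvecs = np.linalg.eigh(C)      # ascending eigenvalues, orthonormal columns
lam_minus = eigvals[0]                    # lowest eigenvalue
V_minus = eigvecs[:, 0]
V_minus = V_minus * np.sign(V_minus.sum())  # fix the arbitrary overall sign
```

In this toy case a single eigenvalue drops below the Poisson level while all others stay at 1/n, and the corresponding eigenvector has exclusively positive components.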

1. Eigenvalues 

The left panel of Fig. 3 shows the eigenvalues $\lambda^{(l)}$ of
the shot noise matrix from Fig. 2 for the 10 halo bins as
a function of k. The eigenvalues are computed separately
for every k-bin and then ordered by their magnitude. As
apparent from the figure, we find two eigenvalues to differ
significantly from all the others. One of them is enhanced
by roughly a factor of 1.5 and one is suppressed by a
factor of about 2.5 compared to the other ones that lie
close to the value $1/\bar{n}_i$. The spread of the curves at low k
is likely due to the low number of modes available there,
making the eigenvalue determination inaccurate.

FIG. 2: Elements of the shot noise matrix as defined in Eq. (4)
with 10 halo mass bins. Most of the diagonal components
(solid lines with stars) agree with Poisson white noise, i.e.,
$C_{ii} = 1/\bar{n}_i$ (dotted black line on top), except the highest-mass
bin which is clearly suppressed (solid black line). There
are both positive (dashed lines with circles, scaled in red)
and negative (dotted lines with triangles, scaled in blue) off-diagonal
components.

This result reveals that one of the eigenvectors,
which represents a particular linear combination of
the halo mass bins, yields a very low shot noise level.
This shot noise level is determined by the lowest eigenvalue
of the shot noise matrix, which we will denote as
$\lambda^-$. Increasing the number of halo bins we find an even
stronger suppression of $\lambda^-$ compared to the expectation
of $1/\bar{n}_i$ (see Sec. IV C).

The other eigenvalue that differs from the value $1/\bar{n}_i$
represents the highest shot noise level. We designate this
eigenvalue $\lambda^+$. Since it does not carry much information
(see below) we do not investigate it further in this paper
beyond noticing that it is likely to be connected to the
second-order bias. We note that had we investigated the
halo covariance matrix $\langle\delta_i\delta_j\rangle$, we would not have been
able to reveal the lowest eigenvalue as cleanly, because it
would have been swamped by sampling variance. Indeed,
previous work focused its attention mostly on the largest
eigenvalues of $\langle\delta_i\delta_j\rangle$ [25].

[Figure 3 axes: $k\,[h\,\mathrm{Mpc}^{-1}]$ (left panel), $M\,[h^{-1}M_\odot]$ (right panel)]

FIG. 3: The 10 eigenvalues (left) and eigenvectors (right) of the shot noise matrix from Fig. 2 in corresponding colors. The
black dotted line shows the value $1/\bar{n}_i$. The eigenvectors are averaged over the entire k-range.



2. Eigenvectors 



Every eigenvector $V_i^{(l)}$ is a function of the wave number,
just like the eigenvalues. However, as can be seen
from Figs. 2 and 3, over a reasonable range of wave numbers
this dependence can be ignored and we average the
eigenvectors over the entire k-range. We also divide each
eigenvector by its length $|V^{(l)}| = \sqrt{\sum_i (V_i^{(l)})^2}$ to normalize
it.

The right panel of Fig. 3 displays the 10 eigenvectors
corresponding to the 10 eigenvalues in the left panel. 
Each component of an eigenvector corresponds to one 
halo mass bin. Since we have equal number densities per 
bin, the mass range per bin gets wider with increasing 
mass due to the rapid decline of the halo mass function. 
Every data point in the figure is plotted at the respective
average mass of each halo bin. Only one eigenvector
shows exclusively positive components, while at least
one negative component can be found in the remaining
eigenvectors. It is this all-positive eigenvector that corresponds to
the lowest eigenvalue $\lambda^-$ and we will denote it as $V^-$.
Its components continuously increase with mass. The
eigenvector $V^+$ corresponding to the largest eigenvalue
$\lambda^+$ also shows a monotonic behavior, but its components
decrease with mass and turn negative at the high-mass
end (similarly to the second-order halo bias with an opposite
sign).

One can think of the eigenvectors as weighting functions
for the halos, since each component acts as a weight
for the associated halo mass bin. Hence, the weighted
halo density field $\delta_w(\mathbf{x})$ in configuration space can be
written as a weighted sum over the halo mass bins, normalized
by the sum of the weights,

$$ \delta_w^{(l)}(\mathbf{x}) = \frac{\sum_i V_i^{(l)} \delta_i(\mathbf{x})}{\sum_i V_i^{(l)}}. \tag{10} $$

We specifically want to investigate the eigenvector $V^-$,
since it yields the lowest eigenvalue of the shot noise matrix,
$\lambda^-$. In Fig. 4 we plot the components of $V^-$ in a
log-log plot to investigate this eigenvector in more detail
and compare to the results with 30 and 100 mass
bins. The components of $V^-$ increase linearly with mass
above $M \simeq 10^{14}\,h^{-1}M_\odot$, while at lower masses the slope
tends to become shallower. We compare this eigenvector
to two different smooth weighting functions for the halo
density field. The first weighting function simply takes
the halo mass as a weight for each halo, $w(M) = M$; we
will denote it as linear mass weighting. However, as apparent
from the dotted lines in Fig. 4, it only matches the
components of $V^-$ at high mass. In order to account for
the saturation effect at low masses, we consider a second
weighting function that mimics this behavior,

$$ w(M) = M + M_0. \tag{11} $$

The free parameter $M_0$ determines the shape of this
weighting function; it specifies the mass threshold where
the saturation sets in. For $M \ll M_0$, Eq. (11) approaches
uniform weighting, whereas in the limit $M \gg M_0$ it
matches linear mass weighting. We call this weighting
scheme modified mass weighting; it is shown as a dashed
curve in Fig. 4 and obviously provides a much better fit
to $V^-$ than linear mass weighting. The fit is shown for
each case of our mass binning. The best-fit value for
$M_0$ increases with the number of bins and in the case of
100 mass-weighted bins becomes $M_0 \simeq 1.7 \times 10^{13}\,h^{-1}M_\odot$.
Note that for visibility reasons this eigenvector is shifted
downwards by a factor of 2 in the plot.
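
A minimal sketch of the two ingredients discussed here, the modified mass weighting w(M) = M + M0 and the weighted combination of binned density fields, is given below. The function names, the threshold value and the toy field values are assumptions for illustration; the threshold is not a fitted value.

```python
import numpy as np

def modified_mass_weight(mass, m0):
    """w(M) = M + M0: uniform for M << M0, linear in mass for M >> M0."""
    return np.asarray(mass, dtype=float) + m0

def weighted_field(weights, delta_bins):
    """delta_w = sum_i w_i delta_i / sum_i w_i for delta_bins of shape (n_bins, ...)."""
    weights = np.asarray(weights, dtype=float)
    return np.tensordot(weights, delta_bins, axes=(0, 0)) / weights.sum()

m0 = 1.0e13                                   # illustrative threshold, not a fitted value
bin_mass = np.array([1.0e10, 1.0e11, 1.0e16, 2.0e16])
w = modified_mass_weight(bin_mass, m0)
low_ratio = w[1] / w[0]                       # close to 1: uniform weighting
high_ratio = w[3] / w[2]                      # close to 2: linear mass weighting

# toy overdensities of four bins on two grid cells
delta_bins = np.array([[0.1, -0.2], [0.3, 0.0], [0.5, 0.4], [1.0, -1.0]])
delta_w = weighted_field(w, delta_bins)
```

The two ratios show the limiting behavior directly: far below the threshold a tenfold change in mass barely changes the weight, while far above it the weight simply follows the mass.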

Similar weighting schemes have already been applied
to the halo density field in [?], where a significant reduction
of the stochasticity between halos and the dark
matter could be achieved. In particular, a trial weighting
function also denoted as modified mass weighting was
shown to improve on linear mass weighting. However, in
that paper the functional form was found empirically and
was not demonstrated to be optimal. In this work we
show why modified mass weighting as defined in Eq. (11)
is the optimal weighting to suppress the stochasticity in
halos: in the limit of many halo mass bins it converges to
the components of $V^-$, the eigenvector of the shot noise
matrix with the lowest eigenvalue. What remains to be
shown is what determines the value of $M_0$: we will argue
it depends on the lower boundary of the halo mass
function considered.

FIG. 4: The normalized eigenvector $V^-$, corresponding to
the lowest shot noise eigenvalue $\lambda^-$, computed for 10, 30, 100
uniformly weighted bins and 100 mass-weighted bins (from
top to bottom). The latter was shifted downwards by a factor
of 2 for visibility. The dotted (blue) and the dashed (red) lines
represent linear and modified mass weighting, respectively.
The best-fit values for $M_0$ are given in the bottom right for
the respective cases.

These results also justify why applying linear mass 
weighting to the halos within each bin leads to a better 
convergence of the eigenvector towards a smooth weighting
with infinitely many bins: linear mass weighting already
reduces the stochasticity of each bin as compared
to the uniformly weighted case (see [?]). The resulting
eigenvector is then determined more accurately, corre- 
sponding to an effectively higher sampling with more 
bins. This mainly has an effect on the highest-mass 
bins, since they have the broadest range in mass. At low 
masses the bins are increasingly narrow, so uniform and 
linear mass weighting become increasingly similar within 
a given bin. 

Other attempts to find an optimal weighting scheme 
for the halo (galaxy) density field found the halo bias to 
yield the best constraining power on dark matter statistics
and cosmological parameters when used as weighting
function [25]-[27]. We have tried using only b(M) as
a weighting function, but found less suppression in halo
stochasticity as compared to modified mass weighting. 



C. Signal-to-noise 

While we have demonstrated that it is possible to sup- 
press the stochasticity of a given halo density field using 
a single linear combination of halo bins, it remains to be 
shown how the information content in this single eigen- 
mode compares to the complete information content. We 
cannot answer this question in general, since it depends 
not only on the property of the shot noise matrix, but 
also on derivatives of the halo density field with respect 
to the cosmological parameters one wants to estimate. 
Those two ingredients depend on halo mass and deter- 
mine the Fisher information content of the halo density 
field. We will not explore the general case here and in- 
stead focus on the simple case where the information con- 
tent is expressed via the ratio of the autopower spectrum 
to the shot noise of a particular tracer (signal-to-noise 
ratio per mode). Its inverse appears in Eqs. (2) and (3).
We compute it for the weighted halo density field,

$$ \frac{P_w}{\sigma_w^2} = \frac{b_w^2 P_{mm}}{\sigma_w^2} = \frac{r_{wm}^2}{1 - r_{wm}^2}. \tag{12} $$

$P_w$ denotes the autopower spectrum of the weighted halo
density field, $\sigma_w^2$ its shot noise, $b_w$ the corresponding
weighted bias and $r_{wm}$ the cross-correlation coefficient
between the fields $\delta_w$ and $\delta_m$ as defined in Eq. (1). The
weighted bias can be computed from the halo bins via

$$ b_w = \frac{\sum_i V_i b_i}{\sum_i V_i}. \tag{13} $$



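It is clear from this expression that in order to maximize
the signal, the eigenvector components should all be of
equal sign, since otherwise different halo bins cancel each
other's signal. Using Eq. (4) to express $\sigma_w^2$ in terms of the
weighted density field $\delta_w$ and Eq. (10) for the definition
of $\delta_w$ (we omit the superscripts for clarity), we have

$$ \sigma_w^2 = \langle(\delta_w - b_w\delta_m)^2\rangle = \frac{\sum_{i,j} V_i C_{ij} V_j}{\left(\sum_i V_i\right)^2} = \frac{\lambda}{\left(\sum_i V_i\right)^2}, \tag{14} $$

where $\sum_i V_i^2 = 1$ in case the eigenvectors are normalized.
Hence, the signal-to-noise ratio becomes

$$ \frac{P_w}{\sigma_w^2} = \frac{\left(\sum_i V_i b_i\right)^2}{\lambda} P_{mm}. \tag{15} $$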



Note that this is only the signal-to-noise ratio for one par- 
ticular weighting of the halo density field, corresponding 
to one eigenmode of the shot noise matrix. The com- 
plete information content of the halos is calculated by 
summing up all N contributions,

$$\frac{S}{N} = \sum_{l=1}^{N} \frac{\left(b_w^{(l)}\right)^2 \left(\sum_i V_i^{(l)}\right)^2}{\lambda^{(l)}}\,\langle\delta_m^2\rangle .$$ (16)

However, since we find one very low eigenvalue of the shot
noise, most of the signal will be contained in the halos
weighted by $V^-$. Adding up the inverse noise contributions
$\left(\sum_i V_i^{(l)}\right)^2/\lambda^{(l)}$ of all eigenmodes and taking the
inverse yields the total noise contribution
of the halos. We call it the reduced shot noise,



(17) 



1.00 



(0^2 



Alternatively, the signal-to-noise ratio of the halo density
field can be derived from a $\chi^2$ distribution. Since the
modes of the halo bins are assumed to be independent,
normally distributed variables, the expression

$$\chi^2 = \sum_k \sum_{i,j} (\delta_i - b_i\delta_m)\, C^{-1}_{ij}\, (\delta_j - b_j\delta_m)$$ (18)

follows a $\chi^2$ distribution. Here, $C^{-1}_{ij}$ refers to the i-j
component of the inverse shot noise matrix and the
index k denotes a summation over all Fourier modes.
The derivative of $\chi^2$ with respect to
the inferred dark matter density field $\delta_m$ must vanish,
$\partial\chi^2/\partial\delta_m = 0$. This yields

$$\delta_m = \frac{\sum_{i,j} b_i C^{-1}_{ij} \delta_j}{\sum_{i,j} b_i C^{-1}_{ij} b_j} .$$ (19)



Here, the vector $\sum_i C^{-1}_{ij} b_i$ again acts as a weighting of the
halo mass bins. The difference to the weighting
with one particular eigenvector of $C_{ij}$ is that this vector
contains the complete information of all eigenmodes and
thus provides an optimal estimator for the dark matter
density field. However, as can be seen in Fig. 5, it has
a very similar shape as $V^-$ and modified mass weighting
provides an equally good fit to this vector. The only
difference is a slight increase in the best-fit value for the
parameter $M_0$.

The second derivative of $\chi^2$ leads to
the signal-to-noise ratio of the halo density field,



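$$\frac{S}{N} = \frac{\langle\delta_m^2\rangle}{2N_k}\frac{\partial^2\chi^2}{\partial\delta_m^2} = \sum_{i,j} b_i C^{-1}_{ij} b_j\,\langle\delta_m^2\rangle .$$ (20)

This expression is equivalent to the Fisher information
on the dark matter density mode $\delta_m$. Here, the reduced
shot noise is simply computed as

$$\sigma_r^2 = \frac{1}{\sum_{i,j} C^{-1}_{ij}} .$$ (21)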
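The minimum-$\chi^2$ estimator described above can be illustrated with a short sketch. It assumes that the weighting vector is $\sum_i C^{-1}_{ij} b_i$, that the estimate of $\delta_m$ is the correspondingly weighted sum of the halo modes divided by $\sum_{i,j} b_i C^{-1}_{ij} b_j$, and that the reduced shot noise is the inverse of the sum over all elements of $C^{-1}$. The helper names and the toy inputs are hypothetical; the small Gaussian-elimination solver only keeps the example free of external dependencies.

```python
def solve(C, rhs):
    """Solve C x = rhs by Gaussian elimination with partial pivoting."""
    d = len(rhs)
    A = [list(C[i]) + [rhs[i]] for i in range(d)]
    for col in range(d):
        pivot = max(range(col, d), key=lambda r: abs(A[r][col]))
        A[col], A[pivot] = A[pivot], A[col]
        for r in range(col + 1, d):
            f = A[r][col] / A[col][col]
            for c in range(col, d + 1):
                A[r][c] -= f * A[col][c]
    x = [0.0] * d
    for i in reversed(range(d)):
        s = sum(A[i][j] * x[j] for j in range(i + 1, d))
        x[i] = (A[i][d] - s) / A[i][i]
    return x


def optimal_weights(C, b):
    """Vector sum_i C^-1_ij b_i (C is symmetric, so this solves C x = b)."""
    return solve(C, b)


def estimate_delta_m(delta, C, b):
    """Minimum-chi^2 estimate of one dark matter mode from the halo bins."""
    w = optimal_weights(C, b)
    numerator = sum(wi * di for wi, di in zip(w, delta))
    return numerator / sum(wi * bi for wi, bi in zip(w, b))


def total_signal_to_noise(C, b, P_mm):
    """sum_ij b_i C^-1_ij b_j times the dark matter power <delta_m^2>."""
    w = optimal_weights(C, b)
    return sum(wi * bi for wi, bi in zip(w, b)) * P_mm


def reduced_shot_noise(C):
    """Inverse of the sum over all elements of C^-1."""
    ones = [1.0] * len(C)
    return 1.0 / sum(solve(C, ones))
```

For noise-free input, $\delta_i = b_i \delta_m$, the estimator returns $\delta_m$ exactly, and for a diagonal Poisson matrix the reduced shot noise reduces to the inverse of the total number density.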



We first show the $V^{(l)}$-weighted bias for the 10 halo
bins in the left panel of Fig. 6. The highest bias, $b_w \simeq 2$,
is achieved by weighting with $V^-$, as expected, since its
components are all positive and give the largest weight to
the highest halo masses. All the other eigenvectors produce
lower values of the bias, distributed around unity.

FIG. 5: The normalized vector $\sum_i C^{-1}_{ij} b_i$ that provides an
optimal estimator for the dark matter, computed for 100 uniformly
weighted mass bins. Since this vector is very similar
to $V^-$, modified mass weighting (dashed red line) still yields
a reasonable fit with a slightly higher value of $M_0$ (bottom
right).



Note, however, that the weighted bias alone is not sufficient
to describe the complete information content of
the weighted halo density field. It is given by the
signal-to-noise ratio in Eq. (15), which contains the weighted
bias, the eigenvalue and the sum over the eigenvector
components.

The signal-to-noise ratios of the 10 eigenmodes are
shown in the right panel of Fig. 6. This panel also shows
the sum of the signal-to-noise ratios of all eigenmodes,
i.e. Eq. (16), and the signal-to-noise ratio as defined in
Eq. (20) as a cross-check. Clearly, the weighting with
$V^-$ dominates the signal-to-noise ratio. The eigenvector
corresponding to the largest eigenvalue yields the second
largest contribution, which may appear surprising
since its effective bias is around unity and its eigenvalue
is by far the largest. However, the sum of this eigenvector's
components is large, which makes its signal-to-noise
ratio exceed that of the remaining eigenmodes.
Still, it is suppressed by roughly 1 order of magnitude
compared to the weighting with $V^-$ and can be safely
ignored. This fact can be cross-checked when we compare
the vector $\sum_i C^{-1}_{ij} b_i$ appearing in Eq. (19) to $V^-$.
We found no appreciable discrepancy between the two.
Thus, the main conclusion from this analysis is that the
eigenmode with the lowest eigenvalue contains most of the
information and the other eigenmodes can be neglected.

FIG. 6: Weighted bias (left) and signal-to-noise ratios (right) of the 10 $V^{(l)}$-weighted halo density fields. Colors correspond
to the eigenvalues and eigenvectors of Fig. 3. The sum of all 10 signal-to-noise ratios is plotted as a dotted (blue) curve. The
dashed (red) curve shows the signal-to-noise ratio as defined in Eq. (20).

So far we explored the signal-to-noise ratio of only 10
eigenmodes, a relatively sparse mass binning of the halo
density field. Do these results converge when increasing
the number of halo mass bins? It is interesting to plot the
inverse signal-to-noise ratio, since it determines the relative
error on the power spectrum. We display it in the left panel
of Fig. 7, where the results for different numbers of halo
bins are presented. Increasing the number of halo bins
improves the signal-to-noise ratio. In the limit of a large
number of bins this should be equivalent to applying the
smooth weighting function we found from the eigenvector
$V^-$ (modified mass weighting) to each halo individually.
We thus compute the inverse signal-to-noise ratio from the
smoothly weighted halo density field, defined as

$$\delta_w(\mathbf{x}) = \frac{\sum_i w(M_i)\,\delta_i(\mathbf{x})}{\sum_i w(M_i)} .$$ (22)



In practice, for every halo of mass $M$ in the simulation we
assign a weight $w(M)$ according to the smooth weighting
function of Eq. (11). Summing over all such weighted
halo overdensities and normalizing yields the smoothly
weighted halo density field $\delta_w$.
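The per-halo weighting described above can be sketched as follows. The sketch assumes the modified mass weighting $w(M) = M + M_0$ and assumes that, when every halo is treated as its own bin, the normalized sum of weighted overdensities equals the density contrast of the summed weights on a grid. The function names and the pre-computed grid-cell indices are hypothetical and only serve as an illustration.

```python
def modified_mass_weight(M, M0):
    """Modified mass weighting w(M) = M + M0."""
    return M + M0


def weighted_overdensity(cells, masses, n_cells, M0):
    """Weighted halo overdensity on a grid with n_cells cells.

    cells[i] is the index of the grid cell that contains halo i; the
    assignment of halos to cells is assumed to have been done already.
    """
    field = [0.0] * n_cells
    for cell, M in zip(cells, masses):
        field[cell] += modified_mass_weight(M, M0)
    mean = sum(field) / n_cells
    return [value / mean - 1.0 for value in field]


# Toy example (assumed numbers): four halos on a four-cell grid.
cells = [0, 1, 1, 3]
masses = [1.0e13, 2.0e13, 4.0e13, 1.0e14]
delta_w = weighted_overdensity(cells, masses, n_cells=4, M0=3.0e13)
```

For $M_0$ much larger than all halo masses the result approaches uniform (number) weighting, while $M_0 = 0$ corresponds to linear mass weighting.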

Its inverse signal-to-noise ratio results in the lowest
(solid black line) curve in the left panel of Fig. 7. We
also compare to the case when we assume all off-diagonal
elements of the shot noise matrix to vanish (long dashed
blue line). Clearly, a lot of information is lost when doing
so, and any improvement compared to uniform weighting
(dotted black line) is canceled. With the full shot noise
matrix, in contrast, we find roughly a factor of 5 improvement
in the best case compared to uniform weighting. We
optimized the value for $M_0$ by iteration to reach a minimal
shot noise level and find $M_0 \simeq 3.4\times10^{13}\,h^{-1}M_\odot$,
larger than our best-fit values for the highest resolution
eigenvector from Fig. 2 and the vector from Fig. 5. This
is expected, since we only tested up to 100 halo mass bins
and have not fully converged yet ($M_0$ increases with the
number of bins). However, the results from 100 mass-weighted
halo bins closely approach the results obtained from smoothly
weighting the halo density field (solid black lines in Fig. 7).

Looking at the reduced shot noise, we see a similar behavior.
The right panel of Fig. 7 shows the same improvements
when accounting for the off-diagonal elements of
the shot noise matrix and increasing the number of bins.
The shot noise of the halo density field can be drastically
reduced using the appropriate weighting, on average by a
factor of 4 in this case. Since the bias increases with our
weighting, the improvement in the inverse signal-to-noise
ratio is more striking, though.

This is well in agreement with the results in [6], where
we applied linear and a different kind of modified mass
weighting to halo density fields with different abundances.
The modified mass-weighting function we applied there was
rather a trial function that happened to suppress the shot
noise better than linear mass weighting, and we did not
derive it via any formal procedure as we do here. In order
to directly compare to these older results, we apply modified
mass weighting to one of the simulations presented in that
paper. In particular, we use the simulation with $1024^3$
dark matter particles and a mean halo number density of
$\bar n \simeq 7.0\times10^{-4}\,h^3\mathrm{Mpc}^{-3}$, resolving halos down to
$M_{\min} \simeq 5.9\times10^{12}\,h^{-1}M_\odot$. This
yields the inverse signal-to-noise ratio and the reduced
shot noise presented in Fig. 8. We also show the results
from the binned halo density field, as before. Note that
the strong decline of the curves corresponding to 100 bins
is likely due to noise at low k. It is the same effect present
already in our first simulation, but it is more pronounced
here, since the number of modes per k-bin is lower by
a factor of 6. Compared to the best case shown in Fig. 2
of [6], we managed to further reduce the inverse
signal-to-noise ratio by an additional factor of 2.

The overall improvement compared to uniform weighting
is even more striking, about a factor of 10 on average
in signal-to-noise and roughly a factor of 4-5 in shot noise.
Hence, owing to the higher mass resolution of this simulation,
we include many more low-mass halos and therefore
roughly double the signal. Via iteration we find the value
$M_0 \simeq 1.7\times10^{13}\,h^{-1}M_\odot$ to yield the lowest shot noise
level, roughly half as large as the value
$M_0 \simeq 3.4\times10^{13}\,h^{-1}M_\odot$ we found in the previous
simulation. In our lower resolution simulation the lowest
halo mass we can resolve is $M_{\min} \simeq 1.1\times10^{13}\,h^{-1}M_\odot$,
while in the higher resolution simulation it is
$M_{\min} \simeq 5.9\times10^{12}\,h^{-1}M_\odot$.

FIG. 7: Inverse signal-to-noise ratio (left) and reduced shot noise (right) of the halo density field, sliced into 10 (dot-dot-dashed
green line), 30 (dashed red line), 100 uniformly weighted (dotted orange line) and 100 mass-weighted (dot-dashed yellow line)
mass bins. The upper curves (long dashed blue line) show the results when neglecting the off-diagonal elements of the shot
noise matrix. They agree well with uniform weighting (dotted black line, left panel) and the value $1/\bar n$ (dotted black line, right
panel), respectively. The lowest curves (solid black line) display the results obtained from weighting the halo density field with
the smooth function $w(M) = M + M_0$.

FIG. 8: Same as Fig. 7, computed from our higher resolution simulation with $1024^3$ particles and a mean halo number density
of $\bar n \simeq 7.0\times10^{-4}\,h^3\mathrm{Mpc}^{-3}$, resolving halos down to $M_{\min} \simeq 5.9\times10^{12}\,h^{-1}M_\odot$.

The decline of $M_0$ with the increase in the resolved
halo mass fraction is expected, since one needs to account
for the unresolved halos in the simulations. The
relation between $M_0$ and $M_{\min}$ should be monotonic; in
the limit of perfect mass resolution we would expect all
the dark matter to be in halos of a certain mass. Weighting
all these halos by their mass should then recover the
statistics of the dark matter density field without shot
noise, at least on large scales. Within the tested domain
of our simulations we find the relation $M_0 \simeq 3M_{\min}$ to
be a good approximation in order to determine the appropriate
choice for $M_0$ given $M_{\min}$.

FIG. 9: Inverse signal-to-noise ratio (left) and reduced shot noise (right) of halos with a log-normal mass scatter of $\sigma_{\ln M} =
0, 0.1, 0.3, 0.5, 0.8 \to 0.4$ and 1.0 (bottom to top), weighted by $w(M) = M + M_0$. The dotted (black) lines show the results
from uniform weighting.



D. Mass uncertainty 

Up to now we have always assumed that the mass of each
halo is precisely known (up to the sampling variance
of the halo finder). However, in realistic observations
the halo mass can only be determined with limited
accuracy. Commonly, the uncertainty is expressed as a
log-normal scatter in halo mass. We can mimic this uncertainty
by adding a Gaussian random variable $G$ with
zero mean and unit variance, scaled by $\sigma_{\ln M}$, to the
exponent of the mass,

$$\tilde M = M \exp\left(\sigma_{\ln M}\,G - \sigma_{\ln M}^2/2\right) .$$ (23)

This yields the noisier mass $\tilde M$, which then follows a
log-normal distribution. The value $\sigma_{\ln M}$ is the log-normal
scatter and the term $\sigma_{\ln M}^2/2$ is subtracted to
maintain the same mean. For optical tracers of clusters
$\sigma_{\ln M}$ is about 0.5 [11] and is expected to be much
lower for SZ or X-ray proxies, such as $Y_X$ [13]. At the
lower mass end the log-normal scatter is, however, poorly
constrained. For simplicity, we will consider a constant
log-normal scatter for all halos and apply the values of
$\sigma_{\ln M} = 0.1, 0.3, 0.5, 1.0$. As a more complicated model
we vary the scatter linearly with mass, with a\nAi = 0.8 
at M = IO12/1-IM0 and m^M = 0.4 at M = lO^^h-^MQ 
(abbreviated as crinM = 0.8 — 0.4). 
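The log-normal mass scatter described above can be mimicked with a few lines of Python. This is a minimal sketch, assuming the scattered mass is $M \exp(\sigma_{\ln M} G - \sigma_{\ln M}^2/2)$ with a unit-variance Gaussian $G$; the function name, the seed and the sample size are illustrative choices and not those used for the results in this paper.

```python
import math
import random


def scatter_mass(M, sigma_lnM, rng):
    """Apply log-normal scatter to a halo mass while preserving its mean."""
    G = rng.gauss(0.0, 1.0)  # zero mean, unit variance
    return M * math.exp(sigma_lnM * G - 0.5 * sigma_lnM ** 2)


# Toy check (assumed numbers): scatter one mass many times and compare
# the sample mean of the noisy masses with the true mass.
rng = random.Random(42)
true_mass = 1.0e13
noisy = [scatter_mass(true_mass, 0.5, rng) for _ in range(100000)]
mean_ratio = sum(noisy) / len(noisy) / true_mass  # close to 1
```

Without the subtracted term $\sigma_{\ln M}^2/2$ the mean mass would be biased high by a factor $\exp(\sigma_{\ln M}^2/2)$, which is about 13% for $\sigma_{\ln M} = 0.5$.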

We again apply modified mass weighting to construct
the smoothly weighted density field as in Eq. (22) and
compute the inverse signal-to-noise ratio as well as the
reduced shot noise from it. For each case we adjust
the value for $M_0$ in the weighting function separately
by iteration. The results are depicted in Fig. 9. When
using modified mass weighting, a 50% log-normal scatter
in halo mass still yields about half the shot noise
level of what is expected from uniform weighting. Even
our model with $\sigma_{\ln M}$ decreasing linearly from 0.8 to
0.4 and the high value of $\sigma_{\ln M} = 1.0$ yield inverse
signal-to-noise ratios and shot noise levels that are below
common expectations. The optimal values for $M_0$
increase with higher mass scatter. We find
$M_0 \simeq 3.5\times10^{13}$, $5.1\times10^{13}$, $9.0\times10^{13}$, $4.4\times10^{14}\,h^{-1}M_\odot$
for the cases of $\sigma_{\ln M} = 0.1, 0.3, 0.5, 1.0$.

FIG. 10: The normalized eigenvector $V^-$ computed for 100
uniformly weighted bins with a log-normal scatter of $\sigma_{\ln M} =
0.5, 0.8 \to 0.4$ and 1.0 added to the halo masses (top to bottom).
For visibility, the lower two eigenvectors are shifted
downwards by a factor of 2 and 5. The dotted (blue) and the
dashed (red) lines represent linear and modified mass weighting,
respectively.

Figure 10 shows the resulting eigenvector $V^-$ when
applying a log-normal scatter with $\sigma_{\ln M} = 0.5, 0.8 \to
0.4$ and 1.0 to the halo masses. In this case we only
present it with uniformly weighted bins, since, due to the
scatter, mass weighting the bins does not improve on
the results. Clearly, the saturation effect at low masses
is more pronounced the stronger the scatter, resulting
in an increase of the value for $M_0$. Also the halo mass
range becomes wider. However, the smooth function for
modified mass weighting still fits the data well. The only
impact that mass scatter has on the eigenvector $V^-$ is to
raise its saturation tail and thus the value of $M_0$. This is
even the case for our model of linearly varying $\sigma_{\ln M}$ with
mass: modified mass weighting still provides a reasonable
fit to the simulation data.



V. HALO MODEL APPROACH 

In order to interpret our results, let us consider the halo
model [11], [30]-[36]. The basic assumption of this model
is that the power spectra of either dark matter or halos
can be written as the sum of a one-halo term $P^{1H}$ and a
two-halo term $P^{2H}$. The former describes the clustering
of substructure within one single halo, whereas the latter
represents the clustering among different halos. Moreover,
it is assumed that all the dark matter is confined
within virialized halos. The two terms can be expressed
analytically via the halo mass function $\frac{dn}{dM}(M)$, the
normalized Fourier transform of the halo profile $u(k, M)$, the
analytic bias $b(M)$, the linear power spectrum $P_{\rm lin}(k)$,
and the mean density of dark matter $\bar\rho_m$. Considering
all the possibilities of auto- and cross-power spectra between
halos in distinct mass bins i and j and the dark
matter, the halo model for uniform weighting yields:

$$P^{2H}_{ij}(k) = \frac{1}{\bar n_i \bar n_j} \iint \frac{dn}{dM}(M)\,\frac{dn}{dM}(M')\,b(M)\,b(M')\,\Theta(M, M_i)\,\Theta(M', M_j)\,P_{\rm lin}(k)\,dM\,dM' = b_i b_j P_{\rm lin}(k)$$ (24)

$$P^{2H}_{im}(k) = \frac{1}{\bar n_i \bar\rho_m} \iint \frac{dn}{dM}(M)\,\frac{dn}{dM}(M')\,b(M)\,b(M')\,M'\,\Theta(M, M_i)\,P_{\rm lin}(k)\,dM\,dM' = b_i P_{\rm lin}(k)$$ (25)

$$P^{2H}_{mm}(k) = \frac{1}{\bar\rho_m^2} \iint \frac{dn}{dM}(M)\,\frac{dn}{dM}(M')\,b(M)\,b(M')\,M\,M'\,P_{\rm lin}(k)\,dM\,dM' = P_{\rm lin}(k)$$ (26)

$$P^{1H}_{ij}(k) = \frac{1}{\bar n_i \bar n_j} \int \frac{dn}{dM}(M)\,\Theta(M, M_i)\,\Theta(M, M_j)\,dM = \frac{1}{\bar n_i}\,\delta^K_{ij}$$ (27)

$$P^{1H}_{im}(k) = \frac{1}{\bar n_i \bar\rho_m} \int \frac{dn}{dM}(M)\,M\,\Theta(M, M_i)\,dM = \frac{M_i}{\bar\rho_m}$$ (28)

$$P^{1H}_{mm}(k) = \frac{1}{\bar\rho_m^2} \int \frac{dn}{dM}(M)\,M^2\,dM = \frac{\langle nM^2\rangle}{\bar\rho_m^2}$$ (29)

Here we assume the large-scale limit for the halo profile,
i.e., $u(k \to 0, M) = 1$. Moreover, $\delta^K_{ij}$ denotes the Kronecker
symbol and $\Theta$ a product of two Heaviside step
functions $\vartheta$:

$$\Theta(M, M_i) = \vartheta(M - M_i)\,\vartheta(M_{i+1} - M) .$$ (30)

Since the integrals all go from 0 to $\infty$, this function selects
the considered halo bin i with mass range $M_i < M <
M_{i+1}$. The corresponding average halo mass of that bin
is simply denoted as $M_i$, whereas for an average over all
halos we omit the index.

In the simple approach adopted here the halo model
predicts a white noise term not only for the autopower
spectrum of halos, but also for the halo-matter cross-
and the matter autopower spectra. However, simulations
have shown that the low-k behavior of the dark matter
one-halo term is incorrect: subtracting off the component
correlated with the linear power spectrum [which
is approximately $P_{\rm lin}(k)$ at large scales] from the simulated
one indeed yields a $k^4$-scaling instead of a constant
white noise in the residual power (mode-coupling
power) [1], [13]. This $k^4$-tail is a consequence of local
mass and momentum conservation of the dark matter on
small scales and the same conservation laws should also
apply to halo-matter correlations. We defer further discussions
of this point to a future publication, where we
show that a proper implementation of mass conservation
in the $k = 0$ limit still yields similar results to those
presented here.

The shot noise matrix $C_{ij}$ can be written as

$$C_{ij} = \langle\delta_i\delta_j\rangle - b_i\langle\delta_j\delta_m\rangle - b_j\langle\delta_i\delta_m\rangle + b_i b_j\langle\delta_m^2\rangle .$$ (31)

Plugging in the sum of the corresponding one- and two-halo
terms for each of the angled brackets, we see that
the two-halo terms cancel each other and we are left with

$$C_{ij} = \frac{1}{\bar n_i}\,\delta^K_{ij} - b_i\frac{M_j}{\bar\rho_m} - b_j\frac{M_i}{\bar\rho_m} + b_i b_j\frac{\langle nM^2\rangle}{\bar\rho_m^2} .$$ (32)

Our lower resolution simulation determines the dark
matter one-halo term to be $\langle nM^2\rangle/\bar\rho_m^2 \simeq 428\,h^{-3}\mathrm{Mpc}^3$.
Note that this value is almost 2 orders of magnitude
smaller than the first term in Eq. (32). However, for
highly biased halo bins it can become important in the
off-diagonal terms of $C_{ij}$. The same applies to the one-halo
term of the halo-matter cross-power spectrum, because
it scales with the mean halo mass of each bin. For
example, it yields $M_1/\bar\rho_m \simeq 164\,h^{-3}\mathrm{Mpc}^3$ for the lowest,
and $M_{10}/\bar\rho_m \simeq 2394\,h^{-3}\mathrm{Mpc}^3$ for the highest of our
10 mass bins. In Fig. 11 we compare each matrix element
of Eq. (32) to the numerically determined shot
noise matrix (from Fig. 2). The model yields a good
agreement with the data; in particular, the observed
sub-Poissonian shot noise power of the highest-mass halos
and the negative off-diagonal components are nicely
reproduced. The off-diagonal elements with low power
are more affected by scatter and therefore show stronger
deviations from the theory.
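The structure of the halo model shot noise matrix can be made concrete with a short sketch. It assumes the form in which the one-halo terms enter: a Poisson term $\delta^K_{ij}/\bar n_i$ on the diagonal, two cross terms $-b_i M_j/\bar\rho_m$ and $-b_j M_i/\bar\rho_m$, and the matter one-halo term $b_i b_j \langle nM^2\rangle/\bar\rho_m^2$. The function name and all input numbers are hypothetical and chosen only to resemble the orders of magnitude quoted in the text.

```python
def halo_model_shot_noise(n_i, b, M, rho_m, nM2):
    """Halo model shot noise matrix built from the one-halo terms.

    n_i   : number density of each mass bin
    b     : bias of each mass bin
    M     : mean halo mass of each mass bin
    rho_m : mean dark matter density
    nM2   : <n M^2>, the mass-function integral of M^2
    """
    d = len(b)
    C = [[0.0] * d for _ in range(d)]
    for i in range(d):
        for j in range(d):
            poisson = 1.0 / n_i[i] if i == j else 0.0
            C[i][j] = (poisson
                       - b[i] * M[j] / rho_m
                       - b[j] * M[i] / rho_m
                       + b[i] * b[j] * nM2 / rho_m ** 2)
    return C


# Toy example (assumed numbers): three bins of equal number density.
rho_m = 7.7e10                  # assumed mean matter density
nM2 = 428.0 * rho_m ** 2        # so that <nM^2>/rho_m^2 = 428
n_i = [3.0e-5, 3.0e-5, 3.0e-5]
b = [1.2, 1.8, 3.0]
M = [1.3e13, 4.0e13, 1.8e14]
C = halo_model_shot_noise(n_i, b, M, rho_m, nM2)
```

In this toy example the diagonal element of the highest-mass bin falls below its Poisson value $1/\bar n_i$ and the off-diagonal element coupling the two highest bins is negative, which mirrors the qualitative behavior described above.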

For the comparison of our model to the numerical data
it is, however, more convenient to look at the eigenvectors
and eigenvalues of the shot noise matrix, since they
describe the complete information on halo stochasticity
in a more concise manner. Let us redefine the halo mass
as

$$\mathcal{M}_i \equiv \frac{M_i}{\bar\rho_m} - b_i\frac{\langle nM^2\rangle}{2\bar\rho_m^2} .$$ (33)

Now, Eq. (32) can be written more succinctly:

$$C_{ij} = \frac{1}{\bar n_i}\,\delta^K_{ij} - b_i\mathcal{M}_j - b_j\mathcal{M}_i .$$ (34)



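It is straightforward to work out the eigenvalues and
eigenvectors of this matrix. For $d > 2$ mass bins, there
are $d-2$ degenerate eigenvalues with the value $\lambda = 1/\bar n_i$.
The two remaining eigenvalues with corresponding eigenvectors
are

$$\lambda^\pm = \frac{1}{\bar n_i} - \sum_j \mathcal{M}_j b_j \pm \sqrt{\sum_j \mathcal{M}_j^2 \sum_j b_j^2} ,$$ (35)

$$V_i^\pm \propto \frac{\mathcal{M}_i}{\sqrt{\sum_j \mathcal{M}_j^2}} \mp \frac{b_i}{\sqrt{\sum_j b_j^2}} .$$ (36)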



They are shown in Fig. 12 for the case of 10 halo bins. It
is remarkable how well the halo model reproduces the
distribution of eigenvalues we found in our numerical analysis.
The mass dependence of the eigenvectors also shows
a good agreement. This can be seen when we renormalize
$V^\pm$ by multiplication with $\sqrt{\sum_i \mathcal{M}_i^2}$ in Eq. (36); we get

$$V_i^\pm = \left(M_i \mp b_i M_0^\pm\right)/\bar\rho_m ,$$ (37)

with

$$M_0^\pm \equiv \bar\rho_m \sqrt{\frac{\sum_j \mathcal{M}_j^2}{\sum_j b_j^2}} \pm \frac{\langle nM^2\rangle}{2\bar\rho_m} .$$ (38)

In other words, $V^\pm$ is nothing else than a superposition
of mass and bias weighting. The relative weight
between the two is determined by $M_0^\pm$. Equation (37)
has a very similar form as the modified mass-weighting
fitting function from Eq. (11). Evaluating Eq. (38) using
$b_i$ and $M_i$ from the simulation with 10 mass bins yields
$M_0^- \simeq 1.2\times10^{13}\,h^{-1}M_\odot$. At the high-mass end, i.e.,
$M \sim 10^{15}\,h^{-1}M_\odot$, the second term in Eq. (37) is negligible
compared to the first one, since $b_i \lesssim 10$ in this
regime. However, at lower masses the two terms become
closer in magnitude and finally the second term dominates
at the low-mass end, i.e., $M \sim 10^{13}\,h^{-1}M_\odot$. In
this regime the bias is a slowly varying function of mass
and thus well approximated by a constant. Hence, the
analytical form of $V^-$ predicted by the halo model agrees
well with the functional form for modified mass weighting
that we found earlier.

FIG. 11: Elements of the shot noise matrix as described by
the halo model in Eq. (32), compared to the simulation results
taken from Fig. 2 (symbols). The diagonal components (solid
lines) monotonically decrease from the lowest (yellow) to the
highest-mass bin (black), in good agreement with the numerical
data. The halo model also reproduces both the positive
(dashed lines, scaled in red) and the negative (dotted lines,
scaled in blue) off-diagonal elements of the shot noise matrix
fairly well.
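The eigenstructure discussed above can be checked numerically. The sketch below assumes a shot noise matrix of the form $C_{ij} = \delta^K_{ij}/\bar n_i - b_i\mathcal{M}_j - b_j\mathcal{M}_i$ with equal number density in all bins and $\mathcal{M}_i = M_i/\bar\rho_m - b_i\langle nM^2\rangle/(2\bar\rho_m^2)$. Under these assumptions the two nontrivial eigenvalues are $1/\bar n_i - \sum_j \mathcal{M}_j b_j \pm |\mathcal{M}||b|$ with eigenvectors proportional to $\mathcal{M}_i/|\mathcal{M}| \mp b_i/|b|$. All function names and numbers are hypothetical.

```python
import math


def reduced_mass(M, b, rho_m, nM2):
    """Rescaled halo mass M_i/rho_m - b_i <nM^2> / (2 rho_m^2) per bin."""
    return [Mi / rho_m - bi * nM2 / (2.0 * rho_m ** 2)
            for Mi, bi in zip(M, b)]


def shot_noise_matrix(n_bin, b, Mcal):
    """C_ij = delta_ij/n_bin - b_i Mcal_j - b_j Mcal_i for equal-density bins."""
    d = len(b)
    return [[(1.0 / n_bin if i == j else 0.0)
             - b[i] * Mcal[j] - b[j] * Mcal[i]
             for j in range(d)] for i in range(d)]


def nontrivial_eigenpairs(n_bin, b, Mcal):
    """Analytic eigenvalues and (unnormalized) eigenvectors for sign = +1, -1."""
    norm_M = math.sqrt(sum(m * m for m in Mcal))
    norm_b = math.sqrt(sum(x * x for x in b))
    dot = sum(m * x for m, x in zip(Mcal, b))
    pairs = {}
    for sign in (+1, -1):
        lam = 1.0 / n_bin - dot + sign * norm_M * norm_b
        vec = [m / norm_M - sign * x / norm_b for m, x in zip(Mcal, b)]
        pairs[sign] = (lam, vec)
    return pairs


def residual(C, lam, vec):
    """Largest component of C v - lam v; zero for an exact eigenpair."""
    d = len(vec)
    return max(abs(sum(C[i][j] * vec[j] for j in range(d)) - lam * vec[i])
               for i in range(d))


# Toy example (assumed numbers): four bins of equal number density.
rho_m = 7.7e10
nM2 = 428.0 * rho_m ** 2
b = [1.2, 1.8, 3.0, 1.5]
M = [1.3e13, 4.0e13, 1.8e14, 2.5e13]
n_bin = 3.0e-5
Mcal = reduced_mass(M, b, rho_m, nM2)
C = shot_noise_matrix(n_bin, b, Mcal)
pairs = nontrivial_eigenpairs(n_bin, b, Mcal)
```

By the Cauchy-Schwarz inequality the eigenvalue with the lower sign never exceeds the Poisson value $1/\bar n_i$, while the one with the upper sign never falls below it.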

In order to check our model more quantitatively, we
compare its predictions directly to our numerical results
in Fig. 12. Here we focus on the nontrivial eigenvalues
$\lambda^\pm$ and eigenvectors $V^\pm$, since only they contain information
on the halo statistics. The agreement between simulation
and theory is remarkable. Only for the eigenvalue $\lambda^+$
do we find a stronger discrepancy; since it shows a slight
scale dependence, it probably requires more detailed modeling.
We did the same comparison for the case of 30 and
100 mass bins and find the agreement in the eigenvectors
to become even better. The offset in $\lambda^+$, however, does
not vanish with an increasing number of bins.

FIG. 12: The eigenvalues $\lambda^\pm$ (left) and eigenvectors $V^\pm$ (right) from Fig. 3 compared to the predictions of the halo model
(dashed red line). The dotted line in the left panel shows the value $1/\bar n_i$.

One might argue that the way we estimate the bias
from the simulation in Eq. (7) is not correct in this approach,
since the halo model predicts a nonzero white
noise term for both the halo-matter cross- and the
dark matter autopower spectrum. We repeated the same
analysis with a shot noise corrected halo bias defined as

$$b_i = \frac{\langle\delta_i\delta_m\rangle - M_i/\bar\rho_m}{\langle\delta_m^2\rangle - \langle nM^2\rangle/\bar\rho_m^2} .$$ (39)



However, we find essentially no differences in the shot
noise matrix and its eigenvalues and eigenvectors. As
can be seen in Eq. (31), this is because small changes
in the bias are compensated by terms of opposite sign.
For the same reason it does not matter much whether we
use the scale-dependent or scale-independent bias in our
analysis.

Another way to compare our model to the simulations
is to look at the estimators for the halo bias itself,

$$\frac{\langle\delta_i\delta_m\rangle}{\langle\delta_m^2\rangle} \quad \mathrm{and} \quad \sqrt{\frac{\langle\delta_i^2\rangle}{\langle\delta_m^2\rangle}} .$$ (40)

Since the halo model yields white noise terms for all three
correlators appearing in these estimators, this can partly
account for their scale dependence. We get

$$\frac{\langle\delta_i\delta_m\rangle}{\langle\delta_m^2\rangle} = \frac{b_i P_{\rm lin}(k) + M_i/\bar\rho_m}{P_{\rm lin}(k) + \langle nM^2\rangle/\bar\rho_m^2} ,$$ (41)

$$\sqrt{\frac{\langle\delta_i^2\rangle}{\langle\delta_m^2\rangle}} = \sqrt{\frac{b_i^2 P_{\rm lin}(k) + 1/\bar n_i}{P_{\rm lin}(k) + \langle nM^2\rangle/\bar\rho_m^2}} ,$$ (42)

where we take $b_i$ as an average over large scales from
Eq. (39) or compute it from the halo mass function via
the peak-background split formalism. The simulation
results for both estimators are shown in Fig. 13 for the
case of 10 mass bins. The halo model reproduces the numerical
results very well up to scales of $k \simeq 0.1\,h\mathrm{Mpc}^{-1}$.
The deviation on smaller scales is expected, since
higher-order bias effects, the nonlinear evolution of the density
field [1], [13], and the detailed shape of the halo profile
begin to matter [33].
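The scale dependence that the white noise terms induce in the two bias estimators can be illustrated with a small sketch. It assumes the halo model correlators $\langle\delta_i\delta_m\rangle = b_i P_{\rm lin} + M_i/\bar\rho_m$, $\langle\delta_i^2\rangle = b_i^2 P_{\rm lin} + 1/\bar n_i$ and $\langle\delta_m^2\rangle = P_{\rm lin} + \langle nM^2\rangle/\bar\rho_m^2$. The function names and the numerical inputs are hypothetical.

```python
import math


def bias_from_cross(b_i, M_i, P_lin, rho_m, nM2):
    """Halo model prediction for <delta_i delta_m> / <delta_m^2>."""
    return (b_i * P_lin + M_i / rho_m) / (P_lin + nM2 / rho_m ** 2)


def bias_from_auto(b_i, n_i, P_lin, rho_m, nM2):
    """Halo model prediction for sqrt(<delta_i^2> / <delta_m^2>)."""
    numerator = b_i ** 2 * P_lin + 1.0 / n_i
    return math.sqrt(numerator / (P_lin + nM2 / rho_m ** 2))


# Toy example (assumed numbers, with rho_m set to 1 so that M_i and nM2
# are given directly in units of M_i/rho_m and <nM^2>/rho_m^2).
b_i, M_i, n_i = 2.0, 1000.0, 3.0e-5
cross_large_scale = bias_from_cross(b_i, M_i, 2.0e4, 1.0, 428.0)
auto_large_scale = bias_from_auto(b_i, n_i, 2.0e4, 1.0, 428.0)
```

Both estimators approach $b_i$ when $P_{\rm lin}$ is much larger than the white noise terms. As $P_{\rm lin}$ decreases towards smaller scales, the auto-correlation estimator is affected much more strongly because of the large Poisson term $1/\bar n_i$.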

The figure also shows the result of the two estimators
when accounting for all of the halos in the simulation.
In the case of uniform weighting (dashed lines)
they both agree on large scales, but show a different
scale dependence towards higher k-modes. With modified
mass weighting (dotted lines), however, both estimators
agree even up to smaller scales, a consequence of
the small stochasticity in this estimator. Note that for a
weighted field we need to account for the weights in the
averaged quantities, so in Eqs. (41) and (42) we have to
exchange $M_i$ by the weighted mean halo mass $M_w$, $b_i$ by
the weighted bias $b_w$ and $1/\bar n_i$ by $1/\bar n \times \langle w^2\rangle/\langle w\rangle^2$, with
the averages computed as integrals over all resolved halo masses.

Last but not least we can utilize the halo model to determine
the reduced shot noise as a function of mass resolution.
For this we need analytic expressions for the halo
mass function $\frac{dn}{dM}(M)$ and the halo bias $b(M)$ to compute
the eigenvalues and eigenvectors of the shot noise
matrix. We use the functional forms of Sheth-Tormen
with the parameters given in [3^ . For infinitesimal bins,
Eqs. (35) and (36) can be rewritten in integral form.

FIG. 13: Scale-dependent bias estimators from halo-matter cross correlation (left panel) and halo-auto correlation (right panel)
as predicted by the halo model. The simulation results from 10 mass bins are shown as crosses with error bars (in color); the
solid lines show the halo model results. The black dots with error bars show the results for only one mass bin (all halos) for
both uniform and modified mass weighting, with the halo model prediction overplotted in dashed and dotted, respectively.

The integrals run from $M_{\min}$ to $\infty$ and we can compute
the reduced shot noise of the weighted halo density
field for various values of $M_{\min}$ using Eqs. (14) and (17)
in their infinitesimal form:



1/^(M)T/^(M) dM 



(53) 



We neglect all eigenmodes except $V^+$ and $V^-$ for this
calculation, since they have the largest contribution in
signal-to-noise. This yields

(54)

The result can then be compared to the expected shot
noise from uniform weighting, which, according to the
halo model, is given by

$$\sigma^2(M_{\min}) = \frac{1}{\bar n} - 2b\frac{M}{\bar\rho_m} + b^2\frac{\langle nM^2\rangle}{\bar\rho_m^2} ,$$ (55)

where $\bar n$, $M$ and $b$ depend on $M_{\min}$ and can be computed
using uniform weights, i.e. $w(M) = 1$. The shot noise for
uniform weighting, Eq. (55), and the reduced shot noise for
modified mass weighting are depicted in Fig. 14 as functions
of $M_{\min}$. Apparently, at low resolution (high $M_{\min}$), the
improvement due to modified mass weighting is quite modest.
However, for $M_{\min} \lesssim 10^{12}\,h^{-1}M_\odot$ the shot noise
for uniform weighting approaches a constant, while the
reduced shot noise for modified mass weighting still
decreases linearly with $M_{\min}$. This linear
trend leads to a suppression of stochasticity by almost
2 orders of magnitude if one can resolve halos down to
$M_{\min} = 10^{10}\,h^{-1}M_\odot$.

FIG. 14: Stochasticity between halos and the dark matter as
a function of mass resolution as predicted by the halo model
for the cases of uniform (solid red line) and modified
mass weighting (dashed green line). The dotted (blue)
line shows the Poisson prediction $1/\bar n$. The results from our
highest resolution simulation are overplotted as red diamonds
(uniform weighting) and green circles (modified mass weighting)
for five different low-mass cuts. The black crosses show
the corresponding values of $1/\bar n$ taken from the simulation.

In order to cross-check these results we computed the
k-averaged shot noise (shown as filled symbols) for various
low-mass cuts from our highest resolution simulation
consisting of $1536^3$ particles resolving halos down
to $M_{\min} \simeq 9.4\times10^{11}\,h^{-1}M_\odot$. Taking into account
all halos in this simulation we obtain a minimal shot
noise level when applying modified mass weighting with
$M_0 \simeq 3.1\times10^{12}\,h^{-1}M_\odot$, which again satisfies the
anticipated relation $M_0 \simeq 3M_{\min}$.

Overall, the agreement between the simulations and
the halo model is reasonable, but not perfect. At low
$M_{\min}$ the halo model underestimates the shot noise of the
uniformly weighted halo density field and the shot noise
suppression due to modified mass weighting relative to
uniform weighting is even larger than predicted by the
model. At high $M_{\min}$ the shot noise in the simulation
is no longer perfectly scale independent, and since we
are taking the average over the whole k-range the result
becomes less accurate.



VI. CONCLUSIONS 

In a previous paper [6] it was shown that weighting
dark matter halos by their mass can lead to a suppression
of stochasticity between halos and the dark matter
relative to naive expectations. In this work we investigated
the shot noise matrix, defined as the two-point correlator
$C_{ij} = \langle(\delta_i - b_i\delta_m)(\delta_j - b_j\delta_m)\rangle$ in Fourier space,
split into equal number density mass bins. The eigensystem
of this matrix reveals two nontrivial eigenvalues, one
of them being enhanced, the other suppressed compared
to the Poisson model expectation. It is the latter that
leads to a reduced stochasticity. The optimal estimator
of the dark matter and the eigenvector corresponding to
the lowest eigenvalue are very similar and the latter dominates
the signal-to-noise ratio of the halo density field.
We fit both vectors by a smooth function of mass which
we denote modified mass weighting. It is proportional to
halo mass at the high-mass end and approaches a constant
towards lower masses which is determined by the
minimum halo mass resolved in the simulations. This
constant is roughly 3 times the minimum halo mass over
the range of masses we explored.

Applying this function to weight the halo density field
results in a field that is more correlated with the dark
matter and has a suppressed shot noise component, improving
upon previous results [6] by a factor of 2 in signal-to-noise.
We investigate the effect of uncertainty in halo
mass, finding that it does not change our fundamental
conclusions, even if it weakens the strength of the
method: a realistic amount of log-normal scatter in mass
at the level of 0.5 increases the shot noise by about a
factor of 2. Our results can directly be applied to methods
that attempt to eliminate sampling variance by investigating
the relation between galaxies and the dark
matter, both tracing the same LSS. In this case the error
is determined by the stochasticity between the two
and reducing it can improve the ultimate reach of these
methods 0, 

Considering the halo model as a theoretical approach
to describe the shot noise matrix, we find analytical expressions
for its eigenvalues and eigenvectors. In particular,
the two nontrivial eigenvectors can be written as
a linear combination of halo bias and halo mass, which
yields good agreement with our simulation results.
Furthermore, the two estimators of the scale-dependent
bias, $\langle\delta_h\delta_m\rangle/\langle\delta_m^2\rangle$ and $\sqrt{\langle\delta_h^2\rangle/\langle\delta_m^2\rangle}$, are well
reproduced. However, our model suffers from the lack of
mass and momentum conservation: its implementation,
together with higher-order perturbation theory and halo
exclusion, further improves the agreement and will be
presented elsewhere.

The halo model suggests the stochasticity between 
modified mass-weighted halos and the dark matter to 
decrease linearly with mass resolution below M ~ 
lO^^/i~^M0, yielding a suppression by almost 2 orders of 
magnitude at M„iin — lO^'^ h~^MQ as compared to uni- 
formly weighted halos. While we focused on the question 
of how well halos can reconstruct the dark matter, our 
analysis is also applicable to the study of stochasticity 
between halos themselves. Indeed, reducing the stochas- 
ticity between different halo tracers by optimal weighting 
techniques, while at the same time canceling sampling 
variance, should be possible even if the dark matter field 
is not measured. This will be addressed in more detail in
future work.

Specific applications are the best way to test the effi- 
ciency of our method. There is probably not much ad- 
vantage in applying it to the standard power spectrum 
determination, where the sampling variance error domi- 
nates the error budget in the limit of small stochasticity, 
while in the opposite limit of rare halos, when the shot 
noise power is comparable to the intrinsic halo power, we 
do not see much gain (as demonstrated by the fact that 
all the points in Fig. 14 overlap for the highest $M_{\min}$,
which corresponds to halos with the lowest number density).
More promising applications are those where the
sampling variance error is eliminated and the error bud- 
get is dominated by stochasticity, or the ratio of shot 
noise power to the halo power. 

In this paper we have focused on the bias determination
from galaxy and dark matter correlations as a
specific application, but other applications are possible,
such as constraining $f_{\mathrm{NL}}$ from non-Gaussianity [13]
and the redshift-space parameter $\beta$ from redshift-space
distortions, to name a few. Upcoming surveys like
SDSS-III [42], JDEM/EUCLID [43, 44] or BigBOSS [45]
and LSST [46] will increase the available number of galaxies
significantly, providing both 3D galaxy maps and 2D
to 3D dark matter maps (via weak lensing techniques, 
enhanced by lensing tomography). 

Our results suggest that correlating modified halo 
mass-weighted galaxies against the dark matter has the 
potential to lead to dramatic improvements in the pre- 
cision of cosmological parameter estimation. We will 
explore more explicit demonstrations of the above-mentioned
applications in future work.